73 research outputs found
Deep Serial Number: Computational Watermarking for DNN Intellectual Property Protection
In this paper, we introduce DSN (Deep Serial Number), a new watermarking
approach that can prevent a stolen model from being deployed by unauthorized
parties. Recently, watermarking in DNNs has emerged as a new research direction
for owners to claim ownership of DNN models. However, the verification schemes
of existing watermarking approaches are vulnerable to various watermark
attacks. Different from existing work that embeds identification information
into DNNs, we explore a new DNN Intellectual Property Protection mechanism that
can prevent adversaries from deploying stolen deep neural networks.
Motivated by the success of serial numbers in protecting conventional software
IP, we make the first attempt to embed a serial number into DNNs.
Specifically, the proposed DSN is implemented in a knowledge distillation
framework: a private teacher DNN is first trained, and its knowledge is then
distilled and transferred to a series of customized student DNNs. During
distillation, each student DNN is augmented with a unique serial number, i.e.,
an encrypted 0/1 trigger pattern. A customer's DNN works properly only when the
customer enters the valid serial number. The embedded serial number can also
serve as a strong watermark for ownership verification.
Experiments on various applications indicate that DSN is effective in
preventing unauthorized deployment without sacrificing the original DNN's
performance. Further experimental analysis shows that DSN is resistant to
different categories of attacks.
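The serial-number gating described above can be illustrated with a toy sketch. Everything here (the `StudentModel` class, the digest check, the stand-in predictions) is hypothetical and only conveys the idea that a distilled student model behaves uselessly unless the valid encrypted trigger pattern is supplied; it is not the paper's implementation.

```python
import hashlib

SERIAL_BITS = "1011001110100101"  # hypothetical 16-bit serial number

class StudentModel:
    """Toy stand-in for a distilled student DNN gated by a serial number."""

    def __init__(self, serial_bits: str):
        # Store only a digest of the valid serial, mimicking an encrypted
        # trigger pattern embedded during knowledge distillation.
        self._digest = hashlib.sha256(serial_bits.encode()).hexdigest()

    def predict(self, features, serial_bits: str):
        if hashlib.sha256(serial_bits.encode()).hexdigest() != self._digest:
            # Invalid serial: output is uninformative (uniform), so a
            # stolen copy cannot be usefully deployed.
            return [1.0 / 3] * 3
        # Valid serial: return the (stand-in) distilled prediction.
        score = sum(features)
        return [0.8, 0.15, 0.05] if score > 0 else [0.05, 0.15, 0.8]

model = StudentModel(SERIAL_BITS)
good = model.predict([0.2, 0.4], SERIAL_BITS)          # meaningful output
bad = model.predict([0.2, 0.4], "0000000000000000")    # uniform output
```

In the actual DSN framework the gating is learned during distillation rather than enforced by an explicit digest check; the sketch only mirrors the input/output behavior.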
Deep Neural Networks Explainability: Algorithms and Applications
Deep neural networks (DNNs) are progressing at an astounding rate, and these models have a wide range of real-world applications, such as movie recommendation at Netflix, neural machine translation at Google, and speech recognition in Amazon Alexa. Despite these successes, DNNs have their own limitations and drawbacks. The most significant one is the lack of transparency behind their behaviors, which leaves users with little understanding of how these models make particular decisions. Consider, for instance, an advanced self-driving car equipped with various DNN algorithms that does not brake or decelerate when confronting a stopped firetruck. This unexpected behavior may frustrate and confuse users, making them wonder why. Even worse, the wrong decision could have severe consequences if the car is driving at highway speed and crashes into the firetruck. Concerns about the black-box nature of complex deep neural network models have hampered their further application in society, especially in critical decision-making domains such as self-driving cars. In this dissertation, we investigate the following three research questions: How can we provide explanations for pre-trained DNN models so as to give insight into their decision-making process? How can we make use of explanations to enhance the generalization ability of DNN models? And how can we employ explanations to promote the fairness of DNN models?
To address the first research question, we explore the explainability of two standard DNN architectures: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). For CNNs, we propose a guided feature inversion framework that exploits the deep architecture itself to produce effective interpretations. The proposed framework not only determines the contribution of each feature in the input but also provides insights into the decision-making process of CNN models. By further interacting with the neuron of the target category at the output layer of the CNN, we enforce the interpretation result to be class-discriminative. For RNNs, we propose a novel attribution method, called REAT, that provides interpretations for RNN predictions. REAT decomposes the final prediction of an RNN into the additive contributions of each word in the input text. This additive decomposition enables REAT to further obtain phrase-level attribution scores. In addition, REAT is generally applicable to various RNN architectures, including GRU, LSTM, and their bidirectional versions. Experimental results on a series of image and text classification benchmarks demonstrate the faithfulness and interpretability of the two proposed explanation methods.
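The additive decomposition property behind REAT can be shown with a toy example. The words and contribution values below are made up for illustration; the point is only that per-word scores sum exactly to the prediction logit, so a phrase score is simply the sum of its word scores.

```python
# Toy REAT-style additive attribution (hypothetical numbers).
words = ["the", "movie", "was", "surprisingly", "good"]

# Per-word contributions whose sum equals the final prediction logit.
contribs = [0.01, 0.05, 0.02, 0.30, 0.52]
prediction_logit = sum(contribs)  # the additive decomposition property

def phrase_score(start: int, end: int) -> float:
    """Phrase-level attribution = sum of word-level contributions."""
    return sum(contribs[start:end])

score = phrase_score(3, 5)  # contribution of the phrase "surprisingly good"
```

Because the decomposition is exact, attribution scores compose across any contiguous span, which is what makes phrase-level analysis fall out for free.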
To address the second research question, we use explainability as a debugging tool to examine the vulnerabilities and failure modes of DNNs, which yields insights that can be used to enhance the generalization ability of DNN models. We propose CREX, which encourages DNN models to focus on evidence that actually matters for the task at hand and to avoid overfitting to data-dependent bias and artifacts. Specifically, CREX regularizes the training process of DNNs with rationales, i.e., subsets of features highlighted by domain experts as justifications for predictions, to enforce that DNNs generate local explanations that conform with expert rationales. In addition, recent studies indicate that BERT-based natural language understanding models are prone to relying on shortcut features for prediction. We employ explainability-based observations to formulate a measurement that quantifies the shortcut degree of each training sample. Based on this shortcut measurement, we propose a shortcut mitigation framework, LTGR, that suppresses the model from making overconfident predictions on samples with large shortcut degrees. Experimental analysis on several text benchmark datasets validates that the CREX and LTGR frameworks effectively increase the generalization ability of DNN models.
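A minimal sketch can convey the shape of CREX-style rationale regularization. The function name, attribution values, and the choice of an L1 penalty on off-rationale attributions are assumptions for illustration, not the dissertation's exact objective.

```python
def crex_loss(task_loss, attributions, rationale_mask, lam=0.1):
    """Hypothetical rationale-regularized loss: the task loss plus a
    penalty on explanation weight that falls OUTSIDE the expert-marked
    rationale features (rationale_mask[i] == 1 means 'in the rationale')."""
    penalty = sum(abs(a) for a, r in zip(attributions, rationale_mask) if r == 0)
    return task_loss + lam * penalty

# Toy example: two features are in the expert rationale, two are not.
loss = crex_loss(
    task_loss=0.5,
    attributions=[0.2, -0.4, 0.1, 0.05],
    rationale_mask=[1, 1, 0, 0],
)
```

Training against such a loss pushes the model's local explanations to concentrate on expert-approved evidence, which is the mechanism CREX uses to discourage reliance on bias and artifacts.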
In terms of the third research question, explainability-based analysis indicates that DNN models trained with the standard cross-entropy loss tend to capture spurious correlations between fairness-sensitive information in encoder representations and specific class labels. We propose a new mitigation technique, namely RNF, that achieves fairness by debiasing only the task-specific classification head of a DNN model. To this end, we leverage samples with the same ground-truth label but different sensitive attributes, and use their neutralized representations to train the classification head of the DNN model. Experimental results on several benchmark datasets demonstrate that the RNF framework effectively reduces discrimination in DNN models with minimal degradation in task-specific performance.
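The neutralization step can be sketched in a few lines. Here the averaging of paired representations and the example vectors are illustrative assumptions; the idea is that two samples sharing a label but differing in the sensitive attribute yield a blended representation for training the classification head, so the head cannot rely on attribute-specific directions.

```python
def neutralize(rep_a, rep_b):
    """Average two same-label, different-sensitive-attribute encoder
    representations (hypothetical RNF-style neutralization)."""
    return [(x + y) / 2 for x, y in zip(rep_a, rep_b)]

# Hypothetical encoder outputs for two samples with the same ground-truth
# label but different sensitive-attribute values.
rep_group_a = [0.9, 0.1, 0.4]
rep_group_b = [0.3, 0.5, 0.4]

# The neutralized representation is what the classification head trains on.
neutral = neutralize(rep_group_a, rep_group_b)
```

Note that only the classification head sees neutralized inputs; the encoder is left untouched, which is why RNF can debias with minimal impact on task performance.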
A Theoretical Approach to Characterize the Accuracy-Fairness Trade-off Pareto Frontier
While the accuracy-fairness trade-off has been frequently observed in the
literature of fair machine learning, rigorous theoretical analyses have been
scarce. To demystify this long-standing challenge, this work seeks to develop a
theoretical framework by characterizing the shape of the accuracy-fairness
trade-off Pareto frontier (FairFrontier), determined by the set of all
Pareto-optimal classifiers that no other classifier can dominate. Specifically, we
first demonstrate the existence of the trade-off in real-world scenarios and
then propose four potential categories to characterize the important properties
of the accuracy-fairness Pareto frontier. For each category, we identify the
necessary conditions that lead to the corresponding trade-off. Experimental
results on synthetic data suggest insightful findings of the proposed
framework: (1) when sensitive attributes can be fully interpreted by
non-sensitive attributes, FairFrontier is mostly continuous; (2) accuracy can
suffer a sharp decline when fairness is over-pursued; and (3) the trade-off can
be eliminated via a two-step streamlined approach. The proposed research enables
an in-depth understanding of the accuracy-fairness trade-off, pushing current
fair machine-learning research to a new frontier.
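The notion of a Pareto frontier of classifiers can be made concrete with a small, self-contained example. The candidate scores below are invented, and this brute-force filter is not the paper's theoretical characterization; it only illustrates what "no other classifier can dominate" means for (accuracy, fairness-violation) pairs.

```python
def pareto_frontier(points):
    """Keep the (accuracy, violation) pairs not dominated by any other
    candidate, where higher accuracy and lower violation are both better."""
    frontier = []
    for acc, vio in points:
        dominated = any(
            a >= acc and v <= vio and (a, v) != (acc, vio)
            for a, v in points
        )
        if not dominated:
            frontier.append((acc, vio))
    return sorted(frontier)

# Hypothetical candidate classifiers: (accuracy, fairness violation).
candidates = [(0.90, 0.20), (0.85, 0.05), (0.80, 0.10), (0.92, 0.25)]
front = pareto_frontier(candidates)  # (0.80, 0.10) is dominated by (0.85, 0.05)
```

Plotting such frontier points against a dense sweep of classifiers is one simple way to visualize whether the empirical trade-off curve is continuous or exhibits the sharp declines the paper analyzes.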
- …